phar: cap decompression output against declared uncompressed_filesize #21806
iliaal wants to merge 1 commit into php:PHP-8.4
Conversation
```diff
-if (SUCCESS != php_stream_copy_to_stream_ex(phar_get_entrypfp(entry), ufp, entry->compressed_filesize, NULL)) {
-    spprintf(error, 4096, "phar error: internal corruption of phar \"%s\" (actual filesize mismatch on file \"%s\")", phar->fname, entry->filename);
-    php_stream_filter_remove(filter, 1);
-    return FAILURE;
+size_t remaining = entry->compressed_filesize;
+while (remaining > 0) {
+    size_t chunk = remaining < 8192 ? remaining : 8192;
+    size_t n = 0;
+    if (SUCCESS != php_stream_copy_to_stream_ex(phar_get_entrypfp(entry), ufp, chunk, &n)) {
+        goto decompression_error;
+    }
+    if (n == 0) {
+        break;
+    }
+    remaining -= n;
+    php_stream_filter_flush(filter, 0);
+    if (php_stream_tell(ufp) - loc > (zend_off_t) entry->uncompressed_filesize) {
+        goto decompression_error;
+    }
+}
```
I don't understand this whole logic, nor the point of it. Surely it suffices to pass the size_t n as a pointer and check whether the copy succeeds and the size is the same as the compressed_filesize. No need for this looping nonsense?
Ouch, "looping nonsense" 😄
The n returned by php_stream_copy_to_stream_ex is bytes read from the source (see *len = haveread at main/streams/streams.c:1831), not bytes written to the destination after the decompression filter runs. Checking n == compressed_filesize just confirms we consumed the declared compressed length; it tells us nothing about how much decompressed data landed on ufp.
The underlying issue is amplification: a phar entry declares e.g. compressed_filesize=200, uncompressed_filesize=1024, but those 200 bytes decompress to gigabytes. The pre-patch code already had a post-copy size check (php_stream_tell(ufp) - loc != uncompressed_filesize), but it only fires after the single bulk copy has already flushed the full decompressed payload to tmpfile. The goal of this patch is to bound what hits disk, not to detect the mismatch.
The only signal that bounds filter output is the write position on ufp. The chunked loop checks php_stream_tell(ufp) - loc > uncompressed_filesize after each 8 KiB chunk, with php_stream_filter_flush(filter, 0) per iteration so tell() reflects the filter output rather than still-buffered bytes. This looked like the simplest solution, but maybe there are others; a single copy_to_stream_ex with an n check won't catch this.
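To make the amplification concrete: DEFLATE reaches roughly 1000x on highly redundant input, so a few hundred declared compressed bytes really can expand far beyond the declared uncompressed size. A quick Python sketch (illustrative only, using Python's zlib rather than the streams layer) shows the ratio:

```python
import zlib

# 1 MiB of zeros compresses to roughly a kilobyte with DEFLATE, so a phar
# entry can carry a tiny compressed payload that expands ~1000x on extract.
original = b"\x00" * (1 << 20)
payload = zlib.compress(original, level=9)

print(len(payload), len(original))  # tiny compressed input, 1 MiB of output
```

The ratio scales with the input, which is why only a bound enforced during decompression, rather than an after-the-fact size comparison, limits what lands on disk.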
The description of _php_stream_copy_to_stream_ex says that it sets *len to the number of bytes moved. So are you telling me this is a lie? Or is the problem that we use a stream filter to do the decompression (which is maybe the issue...)?
I'm not sure I see the point in preventing writes from hitting the disk if we are only writing to a temporary file. Unless this is somehow a security issue, this seems to do extra work and complicate the code for little benefit.
The docstring isn't lying, but with a write filter on the destination, "bytes moved" means bytes fed into the filter (== compressed_filesize on success), not bytes the filter emits to the underlying file. Your 2nd point is the right one: because decompression runs as a filter, *len can't report post-filter output, so the caller has no way to learn the amplified size from a single copy_to_stream_ex call.
On the security side: technically a DoS from a malicious phar is possible here, though a low-quality one given the lack of persistence. More of a bug with security implications than an advisory imho, hence filed as a bug. A phar entry with compressed_filesize=200, uncompressed_filesize=1024 whose 200 bytes decompress to several GiB will write several GiB to the tmpfile before the existing post-copy check fires. An attacker can exhaust /tmp or swap with a handful of KiB of input against any service that opens entries from an untrusted phar (composer-style tooling, CI pipelines scanning phars, upload processors). The "filesize mismatch" path was designed to flag corruption, not to bound amplification, so it fires after the damage.
A check that runs during the copy is still required; there's no way to bound amplification after the fact.
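The chunked-with-running-check strategy is easier to see outside the streams layer. Below is a minimal Python sketch of the same idea (an illustration of the approach only; the actual patch is C against phar_open_entry_fp() and the streams API, and the function name here is hypothetical):

```python
import zlib

def decompress_capped(payload: bytes, declared_size: int, chunk: int = 8192) -> bytes:
    """Stream-decompress, aborting as soon as output exceeds declared_size.

    Mirrors the patch's strategy: feed the decompressor in 8 KiB chunks and
    check the running output size after each one, instead of checking only
    after the full payload has been expanded.
    """
    d = zlib.decompressobj()
    out = bytearray()
    for i in range(0, len(payload), chunk):
        out += d.decompress(payload[i:i + chunk])
        if len(out) > declared_size:
            # analogous to the loop's php_stream_tell(ufp) - loc check
            raise ValueError("decompressed output exceeds declared size")
    out += d.flush()
    if len(out) != declared_size:
        # analogous to the retained post-copy "filesize mismatch" check
        raise ValueError("filesize mismatch")
    return bytes(out)
```

As with the C loop, the bound is enforced at chunk granularity: a single chunk of input can still expand before the check fires, but the overshoot is limited to one chunk's worth of decompressed output rather than the whole payload.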
So what does this actually fix?
Yes, you would need control of the phar for this to matter, which as you said would be mighty risky already, hence this is not an advisory, just a potential issue. The proposed solution does carry a potential negative performance implication from reading in small (typical PHP-sized) buffers. Perhaps a deeper fix inside php_stream_copy_to_stream_ex could address it by flagging oddities like a 100x decompression ratio, which would be very odd, etc., but that is probably a higher-complexity fix. That being said, it is still a concern given the amplification aspect: it could in theory be chained with something else, so a few bytes could cause real trouble. I do think this should be addressed, but I can see why a more general approach might be preferable.
I agree that if we want to stop amplification attacks (which is difficult to do generally), a more general solution is desirable rather than doing this in many different places at once.
Adds an optional max_output parameter to zlib.inflate and bzip2.decompress filters. When set, the filter tracks bytes emitted and aborts with PSFS_ERR_FATAL once the cap is exceeded, stopping decompression amplification mid-stream instead of after the full payload has landed on disk. phar_open_entry_fp() passes entry->uncompressed_filesize as the cap so a lying phar (small declared size, payload that decompresses beyond it) fails during streaming. The previous post-copy filesize mismatch check is retained to catch under-decompression. The parameter is opt-in: omitting the key keeps existing behavior for all current callers.
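For comparison, the cap-the-decompressor idea in this commit has a close analogue in Python's zlib module, whose Decompress.decompress() accepts a max_length bound on emitted bytes. The sketch below is Python for illustration only; max_output is the PHP filter parameter described in this PR, not a Python API:

```python
import zlib

payload = zlib.compress(b"\x00" * (1 << 20))  # ~1 KiB in, 1 MiB out if unbounded
cap = 64 * 1024                               # stand-in for the filter's output cap

d = zlib.decompressobj()
out = d.decompress(payload, cap)     # the decompressor emits at most `cap` bytes
exceeded = bool(d.unconsumed_tail)   # leftover input means the cap was hit

print(len(out), exceeded)
```

Because the bound lives inside the decompressor, the excess output is never materialized at all; the caller just observes that input remains unconsumed and can abort, which is the behavior the max_output parameter gives the zlib.inflate and bzip2.decompress filters here (via PSFS_ERR_FATAL).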
Prevents decompression amplification in phar_open_entry_fp(). The function decompressed entries by streaming compressed_filesize bytes through a filter into a tmpfile, then checked the output size after the full copy. A crafted phar whose compressed data expanded beyond uncompressed_filesize could write large amounts to disk before the check ran.
Replaces the single php_stream_copy_to_stream_ex() call with an 8 KiB chunked loop that flushes the decompression filter and checks the running output size after each chunk. Aborts when output exceeds uncompressed_filesize.